A Survey on Deep Packet Inspection for Intrusion Detection Systems 
Abstract

Deep packet inspection (DPI) is widely recognized as a powerful technique used by intrusion detection systems (IDS) to inspect, deter, and deflect malicious attacks over the network. Fundamentally, almost all intrusion detection systems can search through packets and identify content that matches known attacks. In this paper, we survey DPI implementation techniques, research challenges, and algorithms. Finally, we provide a comparison between the different applied systems.
Key words: Deep packet inspection, intrusion detec- 
tion system, network security, algorithms. 



1 Introduction 

The enormous number of attacks from the Internet, such as viruses, spam, exploits of software vulnerabilities, and many other attack vectors, makes protection methods essential for preventing damage and saving human effort. Therefore, a variety of methods have been used to protect data. These methods began with cryptography, policies, and firewalls, followed by intrusion detection systems (IDS) and, most recently, intrusion prevention systems (IPS) [Hj]. IDS and IPS are considered the second line of defense against outsider attackers, who do not know the cryptographic information. Besides, they act as the first line of defense against insider attackers, who can bypass the cryptographic system.

Deep packet inspection (DPI) is a core component of many systems plugged into the network, including proxies, packet filters, sniffers, IDS, and IPS. Network components use DPI as an essential inspector, and it is applied at different layers of the OSI model. In its early days, DPI was applied at only one layer, based on the header (e.g., in proxies and firewalls); nowadays, layer-independent attacks force us to inspect traffic at all layers. According to the intrusion detection literature, efforts to obtain a fast implementation fall into two main categories [31]: (1) the design of efficient data structures with an optimized memory access rate, and (2) the design of high-throughput algorithms for matching intruder signatures.

In this paper, we survey deep packet inspection algorithms and their use in several existing technologies for intrusion detection systems. The rest of this paper is organized as follows: Section 2 gives an overview of the challenges and goals (or simply objectives) of using deep packet inspection for efficient intrusion detection systems. Sections 3 and 4 introduce the software and hardware implementations of DPI systems, respectively. Section 5 overviews finite state machines, Section 6 compares the existing technologies and architectures, and finally Section 7 draws concluding remarks.

2 Challenges and Goals 

The design and implementation of deep packet inspection face several challenges that hinder its advancement. There are also several ultimate goals and design objectives that are always considered when making a new DPI design. In this section, we list the different challenges and design objectives.

2.1 Deep Packet Inspection Challenges 

When DPI is used as a means to detect intrusions, applying it to the network raises several challenges, which we summarize below.

1. Search algorithm complexity: the complexity of the algorithm and of the comparisons against intruder signatures decreases the throughput of the system. Search algorithms are therefore the main focus of DPI research, since the matching process is resource consuming. For example, the string matching routines in SNORT [35] account for up to 70% of total execution time and 80% of instructions executed on real traces [I].

2. Increasing number of intruder signatures: as the variety of attacks grows, more intruder signatures are needed. This large number of signatures makes the IDS task harder, since the matching process must inspect traffic against all attack fingerprints.

3. Overlapping signatures: attack signatures are usually not general, so they can be grouped by common properties such as protocol type. For example, SNORT [35] has 1096 signatures for HTTP packets. Therefore, packets need to be pre-processed (e.g., classified) before the matching process.

4. Unknown signature location: because of the variety of attacks on different types of applications, intruder patterns are not localized at a specific place in the packet, which means that the IDS must inspect the entire payload of the packet against the attack signatures.

5. Encrypted data: encrypted data cannot be inspected by DPI. However, this problem can be overcome by placing the DPI component behind the decryption device.

As mentioned above, a DPI system faces many challenges, and at the same time it has to meet the requirements of the network. Two main requirements should be satisfied by a DPI system (more details are given in Subsection 2.2): (1) high packet-processing speed, which determines the throughput of the system and must keep up with core network speeds (10 Gbps to 40 Gbps) and edge speeds (1 Gbps); and (2) low cost in terms of memory and power consumption.

2.2 DPI Design Objectives 

DPI systems have to satisfy specific objectives to sustain the growth of traffic rates and intrusion signatures. Hence, we summarize the objectives that a DPI architecture has to satisfy as follows [45, 40]:




Figure 1. DPI implementations 



1. Deterministic performance: the architecture has to process the traffic stream independently of signature characteristics or traffic characteristics. Thus, both software- and hardware-based systems have to handle traffic in the worst case.

2. Memory efficiency: memory access time is one of the main bottlenecks of software DPI implementations, and it is even more critical in hardware designs because of access time and memory scarcity. Thus, a highly memory-efficient design is preferable.

3. Dynamic update: this objective is very important in hardware-based designs, which must add and remove intruder signatures without affecting system operation.

4. Signatures: the DPI system should support both fixed intruder patterns and regular expressions. It should also deal with all types of intruder patterns [213], which we illustrate in the literature review in Section 4.



5. Scalability: scalability is not a big issue in software-based systems, but it is critical in hardware-based systems. Thus, a hardware design has to scale to a virtually unlimited number of signatures.

6. Additional functions: a DPI system can support further functions, such as inspecting multiple traffic sessions separately, not only detecting intruders but also locating them in the traffic, and inspecting a customized subset of the signatures or the entire signature set.

3 Software Deep Packet Inspection Systems

Many packet scanning applications require deep packet inspection. Here, we review three popular ones: SNORT [35], Bro [10], and the Linux L7-filter [25]. SNORT and Bro are two popular intrusion detection systems, while L7-filter is an application-layer protocol analyzer that classifies packets based on application-layer data. All of these systems are open source, which allows a detailed analysis of their abilities and constraints.

3.1 SNORT Intrusion Detection System 

SNORT is an open source intrusion detection system used for protocol analysis and full packet inspection against intruder signatures. SNORT processes packet traffic in multiple stages, as illustrated in Figure 2 [07]. SNORT, like most common IDS, uses a method called analyze-normalize-match (ANM) [32]. SNORT uses several string matching algorithms, one of which is the Boyer-Moore (BM) algorithm, discussed with the other matching algorithms in Section 4.1. A SNORT rule may contain header and content fields: the header part checks the protocol, the source and destination IP addresses, and the ports, while the content part scans the packet payload for one or more patterns. Rules with more than one pattern are called correlated rules. Furthermore, rules can contain negated patterns, which match when the pattern does not occur. A pattern may be written in ASCII, in hex, or in a mixed format; hex parts are enclosed between vertical bar symbols "|". An example of a SNORT rule is [35]:

alert tcp any any -> 198.165.200.24/32 111 
(content: " idc j I 3a3b I j " ; msg: "mountd access";) 
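As the paragraph above notes, Snort content strings mix ASCII text with hex bytes enclosed in vertical bars (for example, content:"|00 01 86 a5|"). The following is a minimal Python sketch, under the assumption that escaped characters (such as \" or \|) are not used, of how such a string can be decoded into the raw bytes that a matcher actually searches for. The function name decode_snort_content is hypothetical and is not part of Snort.

```python
def decode_snort_content(content: str) -> bytes:
    """Hypothetical helper: turn a Snort-style content string such as
    'USER|20|root' into raw bytes. Text outside |...| is literal ASCII;
    text inside is hex bytes (spaces between bytes are allowed).
    Snort escape sequences are not handled in this sketch."""
    parts = content.split("|")
    if len(parts) % 2 == 0:                  # an odd number of '|' symbols
        raise ValueError("unbalanced '|' in content string")
    out = bytearray()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out += part.encode("ascii")      # literal ASCII section
        else:
            out += bytes.fromhex(part)       # hex section; fromhex skips spaces
    return bytes(out)


if __name__ == "__main__":
    print(decode_snort_content("|00 01 86 a5|"))   # b'\x00\x01\x86\xa5'
    print(decode_snort_content("USER|20|root"))    # b'USER root'
```

Decoding to bytes up front means the matching algorithms discussed in Section 4 can treat ASCII, hex, and mixed patterns uniformly.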
Figure 2. SNORT Process Stages



4 Hardware Implementation 

To speed up the inspection process, hardware (HW) implementations are usually the preferred solution for high-speed DPI. However, the different requirements of DPI limit how deep packet inspection can be performed in HW. These limitations stem from the large number of signatures, the complexity and overlap of signatures, and the high rate of signature updates and additions. Therefore, an HW solution has to satisfy these requirements through the following special properties:



1. A high degree of pipelining, to support inspection of a large number of intruder patterns.

2. A high degree of processing capability, to handle complex patterns at LAN speed (e.g., 10 Gbps).

3. Configurable HW, to adapt to changes in the set of intruder patterns.

4. The ability to update or add a new pattern without turning off the DPI component.

Hardware implementations can be grouped into three categories according to the technology they use:

1. Ternary content addressable memory (TCAM) implementations [?T]

2. Field-programmable gate array (FPGA) implementations [17]

3. Multi-core processor implementations [22]

Each implementation has its own advantages and limitations, as we will see when we detail each one. In general, multi-core processor implementations are considered the most preferable because of their programming flexibility, whereas TCAM is preferable when speed is the main concern.

4.1 Matching Algorithms 

Pattern matching relies on an algorithm that processes the data and reports, in reasonable time, whether the pattern exists. Accordingly, many algorithms have been introduced for string matching. However, string matching algorithms always suffer from two factors that affect the throughput of processed data: the first is the number of computations needed to compare the pattern with the data, and the second is the number of patterns that must be compared with the incoming traffic. Historically, the first string matching algorithm was the brute force (BF) algorithm, which compares the first character of the pattern with the data stream. If a single character matches, BF compares the next character of the pattern with the next character of the data, and so on; on a mismatch, it shifts the pattern by one position and starts over. If the whole pattern is matched, it reports a pattern match.
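The brute force procedure described above can be sketched in a few lines of Python. This is an illustrative sketch over byte strings (the function name is hypothetical), not how production IDS engines scan traffic; its worst case is O(mn) character comparisons for an m-byte pattern and an n-byte payload.

```python
def brute_force_search(text: bytes, pattern: bytes) -> list[int]:
    """Return every offset where pattern occurs in text (naive O(m*n) scan)."""
    m, n = len(pattern), len(text)
    matches = []
    for start in range(n - m + 1):      # try each alignment, shifting by one
        j = 0
        while j < m and text[start + j] == pattern[j]:
            j += 1                      # extend the match character by character
        if j == m:                      # the whole pattern matched
            matches.append(start)
    return matches


if __name__ == "__main__":
    print(brute_force_search(b"GET /attack/attack.cgi", b"attack"))  # [5, 12]
```

After a mismatch the scan restarts one position later and re-reads input bytes it has already seen; avoiding that re-reading is exactly what the algorithms below improve on.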

Later on, many algorithms appeared to improve matching performance. These algorithms can be categorized by implementation as software-based, HW-based, or a mixture of both. There are many pattern matching algorithms; the best-known software-based ones are Knuth-Morris-Pratt (KMP) [23], Boyer-Moore (BM) [9], Aho-Corasick (AC) [1], the AC_BM algorithm [14], Wu-Manber [48], and Commentz-Walter (CW) [15]. We summarize the concepts behind selected algorithms and their implementation, design, and applicability for DPI. The best-known HW-based approaches are parallel Bloom filters [17], CAM (content addressable memory), TCAM, and FPGA implementations.

KMP Algorithm: the Knuth-Morris-Pratt (KMP) algorithm [24] came as an enhancement of the brute force algorithm introduced above as the early work on pattern matching. KMP improves on BF by skipping alignments that cannot match when a mismatch occurs in the comparison phase, so input characters are never re-examined. The amount to skip is computed in a preprocessing phase over the pattern. The result of this preprocessing is similar to a finite automaton representing the pattern, in which every match or mismatch determines a certain jump. Both KMP [24] and BM [9] are designed for single-pattern searching.

For a pattern of length m bytes, KMP matches the pattern in an n-byte stream in O(m + n) time. With k patterns, the search time becomes O(k(m + n)), because the single-pattern search is performed k times. In [7], Baker and Prasanna implemented a hardware DPI architecture for the KMP algorithm that exploits HW parallelism to reduce the above bound.
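A minimal Python sketch of KMP, assuming byte-string inputs and hypothetical function names: the preprocessing phase builds the failure table, and the scan uses it to fall back within the pattern instead of rewinding the input. Running the scan once per signature gives the O(k(m + n)) multi-pattern cost discussed above.

```python
def kmp_failure(pattern: bytes) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (KMP preprocessing, O(m))."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: bytes, pattern: bytes) -> list[int]:
    """Return all match offsets in O(m + n); each input byte is read once."""
    if not pattern:
        return list(range(len(text) + 1))
    fail = kmp_failure(pattern)
    matches, k = [], 0              # k = number of pattern bytes matched so far
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]         # on a mismatch, fall back inside the pattern
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]         # continue so overlapping matches are found
    return matches


if __name__ == "__main__":
    print(kmp_failure(b"abab"))               # [0, 0, 1, 2]
    print(kmp_search(b"abababab", b"abab"))   # [0, 2, 4]
```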



4.2 Bloom Filter 



A Bloom filter is a compact data structure that represents the set of pattern strings by their hashed values. The same hash functions used for the patterns are then applied to the input traffic to test for membership. This method was first applied to intrusion detection by Dharmapurikar et al. [17], whose implementation was on an FPGA. The implementation achieves a throughput of 2.12 Gbps. Bloom filters are very elegant for representing set membership, but they have two potential drawbacks: first, they require multiple hash functions and memories; second, they give an approximate answer, since they allow false positives.
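The membership test and its false-positive behavior can be illustrated with a minimal Python sketch. It assumes a single signature length (the parallel design keeps one filter per length), derives its k hash functions by salting SHA-256, and uses arbitrary sizes; all names and parameters are hypothetical. Every window that the filter accepts is confirmed by an exact lookup, which is how false positives are discarded.

```python
import hashlib


class BloomFilter:
    """m_bits-entry bit array probed by k hash functions (illustrative sizes)."""

    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)      # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Assumption: k hash functions obtained by salting SHA-256 with index i;
        # hardware designs use much cheaper hash circuits.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "present or a false positive".
        return all(self.bits[p] for p in self._positions(item))


def bloom_scan(payload: bytes, signatures: list[bytes]) -> list[int]:
    """Slide a window of the signature length over the payload, query the
    filter, and confirm each candidate exactly to drop false positives."""
    length = len(signatures[0])
    if any(len(s) != length for s in signatures):
        raise ValueError("this sketch keeps one filter for one signature length")
    bf = BloomFilter()
    for s in signatures:
        bf.add(s)
    exact = set(signatures)
    hits = []
    for i in range(len(payload) - length + 1):
        window = payload[i:i + length]
        if bf.might_contain(window) and window in exact:   # verification step
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(bloom_scan(b"xx/bin/shyy", [b"/bin/sh", b"cmd.exe"]))   # [2]
```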



4.3 Content Addressable Memory 

Nowadays, the most popular HW technique used in commercial packet inspection products is content addressable memory (CAM) [41]. A CAM is a special memory that compares all its contents in parallel against the input value and returns the address of the matching entry. Hence, CAM is considerably fast and has many desirable properties, such as an access time of about 4 ns and an O(1) search time bounded by a single memory access. However, CAM does not perform longest-prefix matching, which is essential for the many DPI patterns that share the same prefix. Therefore, it is suitable only for exact, fixed-length matching.

Because of this shortcoming of CAM, a new HW component was developed: the ternary CAM (or simply, TCAM). A TCAM stores data with three logical values (0, 1, and X, i.e., "don't care"), and its circuit is constructed as illustrated in Figure 3(b) [41]. Each entry stores a value that represents an intruder signature, and the entries are arranged in descending index order, as illustrated in Figure 3(a) [41].



Combining these CAM properties with support for longest-prefix matching, TCAM became the backbone of many network devices that depend on packet inspection. For example, routers and switches primarily use TCAMs to perform forwarding lookups for Internet Protocol addresses. TCAMs are also used in devices that support packet classification, network address translation, route lookups in storage networks, layer 4 to layer 7 switching, server load balancing, label switching, and high-performance firewall functions, and finally in network intrusion detection systems (NIDS) and network intrusion prevention systems (NIPS) that depend on DPI techniques.

However, TCAM has some general disadvantages [41]:

1. High cost per bit relative to other memory technologies: about 30 times that of SRAM per bit.

2. Storage inefficiency.

3. High power consumption: about 180 times that of SRAM per bit, and the power consumption is proportional to the number of entries searched in each lookup.

4. Limited scalability to long input keys.

The disadvantages specific to DPI are as follows [29]:

1. Range representation problem: TCAM can easily represent pattern prefixes (e.g., "attaXX" catches any word that starts with "atta" followed by two arbitrary letters), but a range signature, which catches one sub-word followed, after an arbitrary number of characters, by the remaining sub-word, consumes many more TCAM entries.

2. Multi-match classification problem: DPI needs all the matching entries of the TCAM to be returned, not just the highest-priority entry.
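The ternary matching and priority behavior described above can be modeled in software. The Python sketch below is an assumption-laden illustration: it works on characters rather than bits, uses "?" as a stand-in for the "don't care" state, and the class and method names are hypothetical. lookup returns what a TCAM natively provides (the lowest-index, highest-priority match), while lookup_all shows the multi-match result that DPI actually needs.

```python
from typing import Optional

WILDCARD = "?"  # stands in for the TCAM "don't care" (X) state in this sketch


def entry_matches(entry: str, key: str) -> bool:
    """Ternary comparison: every position must be equal or a wildcard."""
    return len(entry) == len(key) and all(
        e == WILDCARD or e == k for e, k in zip(entry, key)
    )


class TCAM:
    """Software model; a real TCAM compares all entries in parallel in one cycle."""

    def __init__(self, entries: list[str]):
        self.entries = entries  # index 0 = highest priority

    def lookup(self, key: str) -> Optional[int]:
        """Native TCAM behavior: only the highest-priority (lowest-index) match."""
        for index, entry in enumerate(self.entries):
            if entry_matches(entry, key):
                return index
        return None

    def lookup_all(self, key: str) -> list[int]:
        """Multi-match result that DPI needs: every matching entry."""
        return [i for i, e in enumerate(self.entries) if entry_matches(e, key)]


if __name__ == "__main__":
    tcam = TCAM(["attack", "atta??", "??????"])
    print(tcam.lookup("attack"), tcam.lookup_all("attack"))   # 0 [0, 1, 2]
    print(tcam.lookup("attaxy"), tcam.lookup_all("attaxy"))   # 1 [1, 2]
```

The model also shows the range representation problem: each entry has a fixed width, so a signature with an arbitrary-length gap between two sub-words cannot be expressed as a single entry.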

Bitwise CAM: In [5U], a CAM was implemented with a tree-based content addressable memory structure called "Bitwise CAM", which shares HW at the bit level in order to exploit powerful logic optimizations across multiple strings represented as a Boolean expression. The design runs at approximately 2.5 Gbps and is approximately 30% smaller in area than previously published designs. The authors also exploited parallelism in an extended version of the design.

4.4 TCAM Implementations

In the literature on TCAM-based DPI, Yu et al. [5D] were the first to design a scheme that deals with all types of intruder patterns. In [5D], they implement an IDS scheme that handles intruder signatures through a deep analysis of intruder patterns. The scheme divides intruder patterns into two types: simple patterns and complex patterns. Complex patterns include long patterns, patterns with negation (which match when a specific pattern does not occur in the traffic), and correlated patterns (patterns separated by a specific number of arbitrary characters).

The work by Yu et al. describes a scheme and algorithms for handling each type of pattern and for mapping it into TCAM. The scheme uses SRAM, which is slower to access than TCAM, as a partial hit list (PHL) that records partially matched correlated patterns encountered in the traffic. Nonetheless, the scheme has a bottleneck: an intruder can intentionally send packets that drive the PHL access rate very high and thereby degrade system throughput, because each such packet requires multiple memory lookups.

According to simulation, this scheme can operate on 2 Gbps traffic. The implementation of Yu et al. in [5D] performs a TCAM lookup for each new input character, so an input of n characters requires O(n) TCAM lookups. In contrast, Jung et al. [38] presented a scheme, called the jumping window scheme, which jumps over the input traffic by a window of size m and matches intruder signatures within a single packet. It reduces the number of TCAM lookups over n input characters to O(n/m) and achieves a throughput of 10 Gbps using 2,394 SNORT rules. Sung et al. [39] later extended the jumping window scheme to intruder signatures that span multiple packets.
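The O(n/m) lookup count can be reproduced with a simplified software model of the jumping window idea. This Python sketch is not the exact scheme of [38]: it assumes every pattern is at least m bytes long, simulates the TCAM by storing, for each pattern and each shift j in [0, m), the prefix that would appear at offset j of an m-byte window, and confirms candidates with a hypothetical exact verification step. All names are illustrative.

```python
def build_entries(patterns: list[bytes], m: int) -> list[tuple[int, bytes, bytes]]:
    """For each pattern and shift j, store the prefix pattern[:m-j] that would
    fill the window from offset j onward if the pattern started there."""
    entries = []
    for p in patterns:
        if len(p) < m:
            raise ValueError("sketch assumes every pattern is at least m bytes")
        for j in range(m):
            entries.append((j, p[: m - j], p))
    return entries


def jumping_window_search(text: bytes, patterns: list[bytes], m: int):
    """Return (sorted (offset, pattern) hits, number of window lookups)."""
    entries = build_entries(patterns, m)
    hits, lookups = set(), 0
    for w in range(0, len(text), m):       # jump m bytes at a time
        lookups += 1                       # one simulated TCAM lookup per window
        window = text[w : w + m]
        for j, chunk, p in entries:
            if window[j:] == chunk:        # candidate: p may start at w + j
                start = w + j
                if text[start : start + len(p)] == p:   # verify the full pattern
                    hits.add((start, p))
    return sorted(hits), lookups


if __name__ == "__main__":
    text = b"xxxx/bin/shxxxxcmd.exe"          # 22 bytes
    print(jumping_window_search(text, [b"/bin/sh", b"cmd.exe"], 4))
    # ([(4, b'/bin/sh'), (15, b'cmd.exe')], 6)
```

Every occurrence is caught by exactly one window (the one containing its first byte, at the matching shift), so only ceil(n/m) lookups are needed instead of one per input character.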

4.5 Multi-core Processor Implementations

Multi-core processor implementations are attractive for designing IDS because of their flexibility. However, multi-core processors are still limited in the number of processors and in the size of on-chip memory, which affects the efficiency of IDS implementations on them. In the following, we survey part of the efforts to implement IDS on network processors (NP), which are a type of multi-core processor.

In [IB], de Bruijn et al. developed SafeCard, a framework for network-based intrusion prevention at the network edge that can cope with all levels of abstraction and can easily be extended with new techniques. Furthermore, it can reconstruct and scan TCP streams at Gbps rates while preventing polymorphic buffer-overflow attacks.

Additionally, CardGuard by Bos et al. [5] uses the IXP1200 network processor as an IDS and achieved a few hundred Mbps of Ethernet throughput when scanning TCP connection payloads. In [34], Singh et al. introduced the Early-bird prototype, which consists of sensors that detect attacks and an aggregator for administrative reporting and control. Early-bird can cope with 200 Mbps without dropping packets.

In [12], Clark et al. introduced an IDS that combines IXP network processors and Xilinx Virtex FPGAs.

5 Finite State Machine 

One of the most important tools for designing hardware DPI implementations is the finite state machine (FSM). FSM implementations fall into two categories: deterministic finite automata (DFA) and nondeterministic finite automata (NFA). In this section, we survey the research performed on FSMs in both categories.

5.1 Nondeterministic Finite Automata 

A nondeterministic finite automaton (NFA) is a directed graph whose nodes are called states and whose labeled edges connect the states. More specifically, an NFA has an initial state and one or more final states. Moreover, edges can be labeled with single characters or with the null symbol (ε), and several edges leaving a state may carry the same label, so multiple states can be active simultaneously in an NFA. The NFA is very useful for parallel processing because it can process an input character along multiple branches and may reach multiple accepting states for the same input, unlike a DFA [21].
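The "multiple active states" behavior can be made concrete with a small Python simulation. The NFA below is a hypothetical, hand-built automaton for the regular expression a(b|c)*d with ε-edges (label None); the simulation keeps a set of active states, takes the ε-closure after every step, and re-activates the start state at each position so that matches may begin anywhere in the payload.

```python
EPS = None  # label used for ε (null) edges

# Hypothetical hand-built NFA for the regular expression a(b|c)*d.
NFA = {
    0: [("a", 1)],
    1: [(EPS, 2), (EPS, 4)],
    2: [("b", 3), ("c", 3)],
    3: [(EPS, 1)],
    4: [("d", 5)],
    5: [],
}
START, ACCEPT = 0, {5}


def eps_closure(states: set[int]) -> set[int]:
    """All states reachable from `states` using only ε-edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, nxt in NFA[s]:
            if label is EPS and nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_scan(text: str) -> list[int]:
    """Return the end offsets of matches (unanchored search)."""
    active = eps_closure({START})
    ends = []
    for i, ch in enumerate(text):
        moved = {nxt for s in active for label, nxt in NFA[s] if label == ch}
        active = eps_closure(moved | {START})   # every active branch advances at once
        if active & ACCEPT:
            ends.append(i)
    return ends


if __name__ == "__main__":
    print(eps_closure({1}))        # {1, 2, 4}: three states active after reading 'a'
    print(nfa_scan("xxabcbd"))     # [6]
```

In hardware NFA designs each state is typically a flip-flop, so all active branches advance in the same clock cycle; this software loop serializes that work.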

Because of this usefulness, there have been many efforts to construct DPI systems based on NFAs. In [33], Sidhu and Prasanna were the first to use NFAs to match regular expressions in a given text using FPGAs. To match a regular expression of length n, a serial machine requires O(2^n) memory and takes O(1) time per text character. In contrast, they proposed an approach that requires O(n^2) space and still processes a text character in O(1) time (one clock cycle). Additionally, they presented a simple and fast algorithm that quickly constructs the NFA for a given regular expression. Fast NFA construction is crucial because the NFA structure depends on the regular expression, which is known only at runtime. Furthermore, in [13], Clark et al. implemented an FPGA-based multi-character decoder for DPI based on NFAs.

5.2 Deterministic Finite Automata 

A deterministic finite automaton (DFA) consists of a finite set of input symbols (denoted Σ), a finite set of states, a transition function δ that moves from one state to another, a start state, and a set of accepting states. In contrast to an NFA, a DFA has exactly one active state at any given time [21].

Regular Expressions: regular expressions are needed for inspecting the payloads of packets of different protocols, because fixed strings alone give a limited DPI system that cannot describe all packet structures. As a result of this limitation, state-of-the-art systems have replaced the string sets of intrusion signatures with more expressive regular expressions (regexps). Therefore, several content inspection engines have partially or fully migrated to regexps, including those in Snort [35], Bro [10], 3Com's TippingPoint X506 [02], SafeXcel [32], and Cisco Systems' products [23]. However, using regexps to represent patterns involves converting each regexp to a DFA [21]. The DFA is represented in the DPI system as a table whose records are the states and transitions of the DFA, which means that the memory needed for the table grows with the size of the DFA.

Experimentally, the DFA of a regexp set containing hundreds of patterns can grow to tens of thousands of states, which means memory consumption of hundreds of megabytes. This aggravates one of the common problems of HW-based DPI solutions, namely memory access, because the number of accesses to off-chip memory is proportional to the number of bytes in the packet.
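One well-known source of this growth can be reproduced with the classic subset construction. The Python sketch below builds a hypothetical NFA for the unanchored regexp .*a.{k} over the two-letter alphabet {a, b} (k + 2 NFA states) and converts it into a DFA transition table; the DFA needs 2^(k+1) states, because it must remember where an "a" occurred among the last k + 1 characters. The example is illustrative only and is not taken from any of the surveyed systems.

```python
ALPHABET = "ab"


def nfa_for(k: int) -> dict:
    """Hypothetical NFA for .*a.{k}: state 0 loops on every symbol and also
    moves to 1 on 'a'; states 1..k advance on any symbol; k + 1 accepts."""
    delta = {0: {c: {0} for c in ALPHABET}}
    delta[0]["a"] = {0, 1}
    for s in range(1, k + 1):
        delta[s] = {c: {s + 1} for c in ALPHABET}
    delta[k + 1] = {c: set() for c in ALPHABET}
    return delta


def subset_construction(delta: dict, start: int = 0) -> dict:
    """NFA-to-DFA conversion (this NFA has no ε-edges). Each DFA state is a
    frozenset of NFA states; the result is the table a DPI engine would store."""
    table, todo = {}, [frozenset({start})]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for c in ALPHABET:
            nxt = frozenset(t for s in current for t in delta[s][c])
            table[current][c] = nxt
            if nxt not in table:
                todo.append(nxt)
    return table


if __name__ == "__main__":
    for k in range(6):
        dfa_states = len(subset_construction(nfa_for(k)))
        print(f"k={k}: NFA states={k + 2}, DFA states={dfa_states}")
    # DFA sizes printed: 2, 4, 8, 16, 32, 64
```

With a full 256-symbol byte alphabet each DFA state needs 256 table entries, so an exponential state count translates directly into the memory figures mentioned above.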

In [26], Kumar et al. noted that implementing the regexps of intruder signatures consumes much memory, and that this memory consumption should be reduced without increasing the number of memory lookups per input character, since each additional lookup adds delay. To reduce memory consumption, they introduced the delayed input DFA (D2FA), which compacts the traditional DFA of a regexp based on the observation that many DFA states have the same outgoing transitions. For example, if two states s1 and s2 have transitions to the same set of next states (S) for a set of input characters C, these transitions can be eliminated from state s1 and replaced by a default transition (DT) to s2.
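The default-transition idea can be illustrated with a minimal Python sketch. It assumes a complete DFA stored as a list of per-state transition dictionaries and chooses each state's default with a simple greedy rule (the lower-numbered state that shares the most transitions). This is a hypothetical simplification, not Kumar et al.'s spanning-tree heuristic; restricting defaults to lower-numbered states guarantees that default chains terminate.

```python
def compress(dfa: list[dict[str, int]]):
    """Return (labeled, default): labeled[s] keeps only the transitions that
    differ from those of default[s]; default[s] is None for full rows."""
    labeled, default = [], []
    for s, row in enumerate(dfa):
        best, best_shared = None, 0
        for t in range(s):                       # candidate default states
            shared = sum(1 for c in row if dfa[t].get(c) == row[c])
            if shared > best_shared:
                best, best_shared = t, shared
        default.append(best)
        if best is None:
            labeled.append(dict(row))            # no useful default: keep everything
        else:
            labeled.append({c: n for c, n in row.items() if dfa[best].get(c) != n})
    return labeled, default


def next_state(labeled, default, s: int, c: str) -> tuple[int, int]:
    """Follow default transitions until a labeled edge for c is found.
    Returns (next state, number of table rows visited)."""
    accesses = 1
    while c not in labeled[s]:
        s = default[s]
        accesses += 1
    return labeled[s][c], accesses


if __name__ == "__main__":
    # Complete DFA for the pattern "ab" over the alphabet {a, b, c}.
    dfa = [{"a": 1, "b": 0, "c": 0},
           {"a": 1, "b": 2, "c": 0},
           {"a": 1, "b": 0, "c": 0}]
    labeled, default = compress(dfa)
    print(sum(len(r) for r in labeled), "of", 3 * 3, "transitions stored")  # 4 of 9
    print(next_state(labeled, default, 2, "b"))   # (0, 2): one default hop
```

The example shows both sides of the trade-off: the table shrinks from 9 to 4 stored transitions, but some characters now cost an extra memory access to follow a default transition.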

Under this scheme, state s1 keeps only the transitions in which it differs from s2; for all other characters it follows the default transition to s2 and then takes s2's transition to the next state. D2FA thus constructs a compact DFA with much lower memory consumption. However, compacting the memory representation with default transitions means that several default transitions may have to be traversed before reaching the next state. Traversing multiple DTs requires multiple memory accesses, which decreases DPI throughput. Nevertheless, Kumar et al. found that D2FA can reduce memory usage dramatically, by about 95%, which makes it possible to implement DPI in on-chip memory; this provides high memory bandwidth and reduces the cost of the multiple DT accesses needed to process an input character. Constructing the optimal D2FA from a DFA is NP-hard, so they introduced heuristic algorithms that balance the depth of the DT chains against the memory consumption of the D2FA. A D2FA construction heuristic based on a maximum-weight spanning tree, however, creates long default paths [25].

In [27], also by Kumar et al., a new representation of regexps was developed as an alternative to D2FA; it is more compact than D2FA and reduces the cost of traversing multiple DTs per input character by putting more information into the state identifiers. The content-addressed D2FA (CD2FA) replaces state identifiers with content labels that include part of the information that would normally be stored in the table entry for the state. The main idea of CD2FA is to keep the compaction that D2FA achieves over the DFA while avoiding the traversal of multiple DTs for each input character. However, CD2FA needs larger state labels to hold more information about the next state and the DTs. Two objectives must therefore be satisfied: first, states should have few labeled transitions; second, default paths should be as short as possible.

According to experimental evaluation, CD2FA outperforms the uncompressed DFA. Furthermore, CD2FA with a 1 KB cache achieves twice the throughput of the uncompressed DFA while requiring only 10% of its memory.

Table 1. Comparison between Existing Architectures

Algorithm / Component | Implementation Device | Throughput (Gbps)
Parallel Bloom Filters [17] | FPGA XCV2000E | 2.46
Aho-Corasick [?] | FPGA | 12.35
TCAM [20] | TCAM | 2
(illegible) [44] | (illegible) | 8
TCAM/FPGA [?] | Xilinx Virtex (model illegible) | 1
(illegible)/SRAM [?] | n/a | 14
Selective multi-character transitions/FPGA [37] | Xilinx XC2V6000-6 | 14
B-FSM/FPGA or ASIC [45] | Xilinx Virtex-4 | n/a
(illegible)/SRAM [?] | FPGA/ASIC | 1-20
RTCAM [46] | TCAM | 12.35
Pre-Decoded CAM [36] | Xilinx Virtex2 (model illegible) | (illegible)
Quad Bloom Filter/FPGA [?] | Xilinx Virtex-4 | 20.4
Bitwise CAM [?] | FPGA Xilinx XC2V8000 | n/a
FPGA [18] | Virtex-4 | 10
UCLA Packet/FPGA [11] | Xilinx Spartan-3 XC3S2000 | 3.2
NFA/(FPGA and IXP) [12] | Xilinx Virtex2-6000 & IXP 2400 | 1
GaTech Decoder Trees/FPGA [13] | Virtex2-8000 | 2
WashU Bloom/FPGA [5] | Virtex-4 100 | 20.4
Hash Function [49] | Xilinx Virtex-II Pro XC2VP70 | 2
Hash Function and CRC [30] | Xilinx Virtex2 | 2.712-4.560
TCAM/Network Processor [38] | Network Processor IXDP28xx [22] | 10

Figure 4. Compressed AC for high speed DPI: (a) Aho-Corasick finite state machine; (b) compressed AC.

Figure 5. Aho-Corasick DFA for the patterns "he", "she", "his", and "her" (not all failure edges are shown, for simplicity).
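Figure 5 depicts the Aho-Corasick automaton for the patterns "he", "she", "his", and "her". A minimal Python sketch of how such an automaton can be built (a trie of goto transitions plus failure links computed breadth-first) and then used to scan a payload in a single pass is given below; the function names and the dictionary-based layout are illustrative assumptions, not a memory-efficient DPI implementation.

```python
from collections import deque


def build_ac(patterns: list[str]):
    """Return (goto, fail, out) for the Aho-Corasick automaton of `patterns`."""
    goto = [{}]          # goto[s][ch] -> next state; state 0 is the root
    out = [set()]        # patterns recognized on reaching state s
    for p in patterns:   # phase 1: insert every pattern into the trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())        # depth-1 states fail to the root
    while queue:                           # phase 2: failure links, breadth-first
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]         # inherit matches of the fallback state
    return goto, fail, out


def ac_search(text: str, goto, fail, out) -> list[tuple[int, str]]:
    """Scan text once and return sorted (start offset, pattern) pairs."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                    # follow failure edges on a mismatch
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


if __name__ == "__main__":
    automaton = build_ac(["he", "she", "his", "her"])
    print(ac_search("ushers", *automaton))   # [(1, 'she'), (2, 'he'), (2, 'her')]
```

All patterns are matched in one pass over the input regardless of how many there are, which is why AC suits large signature sets; the cost moves into the automaton's size, as discussed next.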



Aho-Corasick Algorithm: the Aho-Corasick (AC) algorithm [1] is one of the best-known algorithms for multi-string (multi-pattern) matching; it encodes the intruder patterns in an FSM during a preprocessing phase. The generated FSM has a root state, which represents that no string has been matched or even partially matched, and all pattern characters are enumerated from the root. Patterns that share a common prefix also share the corresponding set of parent nodes in the trie. Figure 5 shows an example of the AC FSM constructed for the patterns "he", "she", "his", and "her". However, AC construction consumes a lot of memory because of the large number of failure transitions, which grows with the number of patterns in the FSM. Thus, classical AC takes more storage than is likely to fit in on-chip SRAM or in a processor cache [33].

In [3], Alicherry et al. constructed a compressed finite state machine that encodes all the intrusion patterns and makes state transitions on multiple (at most k) input characters. They first construct the Aho-Corasick DFA, as in Figure 4(a), and then create an equivalent state machine, called the compressed DFA, as illustrated in Figure 4(b), whose transitions consume multiple input characters by combining k consecutive states of the Aho-Corasick DFA. In [30], Lin et al. proposed a different AC construction that splits each input character into bits and builds small blocks, each representing a portion of the rules over a portion of the bits. This construction exploits fast on-chip memory to hold the small blocks and speeds up overall system throughput.

6 Comparison between Existing Modules and Implementations

In this section, we compare recently applied IDS with different hardware implementations. Our comparison focuses on the algorithm, the type of hardware used in designing the DPI architecture, and the resulting throughput, as shown in Table 1. Other related properties, including the required memory and further specifications, can be found in the corresponding references.



7 Conclusion 



In this paper, we presented a survey of some existing and ongoing research on DPI. Our survey covered the challenges and ultimate goals behind the design of DPI and its implementations. We also gave an overview of existing implementations, both software and hardware. Since the finite state machine (or automaton) is an important component of hardware designs, we considered its different types and the ongoing research on each type. Finally, we presented a comparison between existing modules and hardware implementations in terms of the achieved throughput.

We believe that this area of research is still active, and more work is needed on both the hardware and software sides of the implementation, in addition to the design of fast matching algorithms that keep up with ever-increasing throughput demands. Our survey is a first step toward introducing readers to DPI systems and the open research topics in the field.